Skip to content

Conversation

@mykaul
Copy link

@mykaul mykaul commented Aug 11, 2025

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

…code paths.

Minor perf. improvement (very) if compression is used - or not: do not call len() for the buffer redundantly.

Signed-off-by: Yaniv Kaul <[email protected]>
@mykaul mykaul requested a review from Copilot August 11, 2025 15:59
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes compression paths by eliminating redundant len() calls on data that will be compressed. The changes store the length of data in variables before compression and pass it as a parameter to compression functions, avoiding multiple length calculations on the same data.

  • Modified compression function signatures to accept a length parameter
  • Updated compression logic to pass pre-calculated lengths instead of recalculating after compression
  • Fixed a typo in a docstring ("one of more" → "one or more")

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
cassandra/connection.py Updated LZ4 and Snappy compression functions to accept length parameter
cassandra/protocol.py Modified message encoding to pre-calculate body length and pass to compressor
cassandra/segment.py Updated segment compression to use pre-calculated lengths and fixed compression logic
tests/unit/test_segment.py Updated test cases to use new compression function signature

In essence, we know the length of the uncompressed payload, so pass it along.
To do so, changed the signature of the function (and did the same as dummy for snappy).

Signed-off-by: Yaniv Kaul <[email protected]>
@mykaul mykaul force-pushed the remove_len_from_compress branch from 0244246 to 078a131 Compare August 11, 2025 16:06
@mykaul
Copy link
Author

mykaul commented Aug 11, 2025

To my surprise, very similar code exists in GoCQL....

@mykaul
Copy link
Author

mykaul commented Aug 11, 2025

CI failure looks unrelated (especially as no compression was involved, but who knows):

=================================== FAILURES ===================================
______ LightweightTransactionTests.test_no_connection_refused_on_timeout _______

self = <tests.integration.standard.test_query.LightweightTransactionTests testMethod=test_no_connection_refused_on_timeout>

    def test_no_connection_refused_on_timeout(self):
        """
        Test for PYTHON-91 "Connection closed after LWT timeout"
        Verifies that connection to the cluster is not shut down when timeout occurs.
        Number of iterations can be specified with LWT_ITERATIONS environment variable.
        Default value is 1000
        """
        insert_statement = self.session.prepare("INSERT INTO test3rf.lwt (k, v) VALUES (0, 0) IF NOT EXISTS")
        delete_statement = self.session.prepare("DELETE FROM test3rf.lwt WHERE k = 0 IF EXISTS")
    
        iterations = int(os.getenv("LWT_ITERATIONS", 1000))
    
        # Prepare series of parallel statements
        statements_and_params = []
        for i in range(iterations):
            statements_and_params.append((insert_statement, ()))
            statements_and_params.append((delete_statement, ()))
    
        received_timeout = False
        results = execute_concurrent(self.session, statements_and_params, raise_on_first_error=False)
        for (success, result) in results:
            if success:
                continue
            else:
                # In this case result is an exception
                exception_type = type(result).__name__
                if exception_type == "NoHostAvailable":
                    pytest.fail("PYTHON-91: Disconnected from Cassandra: %s" % result.message)
                if exception_type in ["WriteTimeout", "WriteFailure", "ReadTimeout", "ReadFailure", "ErrorMessageSub"]:
                    if type(result).__name__ in ["WriteTimeout", "WriteFailure"]:
                        received_timeout = True
                    continue
    
                pytest.fail("Unexpected exception %s: %s" % (exception_type, result.message))
    
        # Make sure test passed
>       assert received_timeout
E       assert False

tests/integration/standard/test_query.py:957: AssertionError

mykaul added a commit to mykaul/gocql that referenced this pull request Aug 11, 2025
Very similar changes to the Python driver (scylladb/python-driver#520 ),
there are redunandant calls to len(). Removed them.

Follow-up patch will hopefully change the Encode() functions to accept the length of the buffer,
to remove another len() call (just as done in the Python change mentioned above).

Signed-off-by: Yaniv Kaul <[email protected]>
mykaul added a commit to mykaul/gocql that referenced this pull request Aug 11, 2025
Very similar changes to the Python driver (scylladb/python-driver#520 ),
there are redundant calls to len(). Removed them.

Follow-up patch will hopefully change the Encode() functions to accept the length of the buffer,
to remove another len() call (just as done in the Python change mentioned above).

Signed-off-by: Yaniv Kaul <[email protected]>
@dkropachev
Copy link
Collaborator

I don't think it makes sense to complicate API by saving couple of CPU ticks, if anything this optimization should be done by python or cython.

@mykaul
Copy link
Author

mykaul commented Aug 12, 2025

I don't think it makes sense to complicate API by saving couple of CPU ticks, if anything this optimization should be done by python or cython.

I'm not sure what API I'm changing here (it's never clear to me what is considered an API in our Python driver - documentation is poor). I think a couple of CPU ticks + improved readability (IMHO) is a good direction.
The compiler is supposed to optimize good code, not mediocre code.

Nevertheless, if you don't agree with the approach, OK. Closing.

@mykaul mykaul closed this Aug 12, 2025
@roydahan
Copy link
Collaborator

We could bypass the "API" breakage, by setting a default and/or handling for missing values for each new parameter that was introduced to the these 2 functions.

@dkropachev
Copy link
Collaborator

We could bypass the "API" breakage, by setting a default and/or handling for missing values for each new parameter that was introduced to the these 2 functions.

The problem is not in the breaking API, but rather in complicating it without good reason.

Also this API ought to be changed to be streaming.

@mykaul
Copy link
Author

mykaul commented Aug 13, 2025

We could bypass the "API" breakage, by setting a default and/or handling for missing values for each new parameter that was introduced to the these 2 functions.

The problem is not in the breaking API, but rather in complicating it without good reason.

Also this API ought to be changed to be streaming.

I don't fully understand what API you refer to. I don't think I've touched any public API.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants